Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag remove preprocessor #640

Merged
merged 8 commits into from
Aug 8, 2017
Merged

Tag remove preprocessor #640

merged 8 commits into from
Aug 8, 2017

Conversation

mpacer
Copy link
Member

@mpacer mpacer commented Aug 7, 2017

This PR implements tag based element filtering using a Preprocessor to remove entire cells, entire output areas, and individual outputs.

Cell removal is enabled by adding a tag to tags in the cell's metadata and then adding the tag to TagRemovePreprocessor.remove_cell_tags.

Output area removal is enabled by adding a tag to tags in the cell's metadata and then adding the tag to TagRemovePreprocessor.remove_all_outputs_tags.

Individual output removal is enabled by adding a tag to a tags field created in an output's metadata, and then adding the tag to TagRemovePreprocessor.remove_single_output_tags.

I have an idea talked through with @Carreau on how to additionally filter inputs (which would create invalid notebook objects), but that can't be done with a preprocessor (since preprocessors can only produce valid notebooks). Therefore, I'll leave that to another PR.


remove_cell_tags = List(Unicode, default_value=[]).tag(config=True)
remove_all_outputs_tags = List(Unicode, default_value=[]).tag(config=True)
remove_single_output_tags = List(Unicode, default_value=[]).tag(config=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the order doesn't matter here, I think you could make them sets instead of lists. That means we can use set operations to check if any tag matches:

self.remove_cell_tags.intersection(cell.get('metadata', {}).get('tags', []))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if 'metadata' in cell:
for field in self.remove_metadata_fields:
cell.metadata.pop(field, None)
if cell.get('outputs', {}):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter to the code, but for the sake of clarity, I'd make the fallback value a list rather than a dict. Or maybe even None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


def preprocess_output(self, output, resources,
cell_index, output_index):
return output, resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning some future extension here, or can this method be discarded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was partially trying to give people a handle on preprocessing individual outputs if they wanted to subclass (I thought it might be helpful for things like tag based preprocessing… but I'm realising now that something like that is more broad ranging and should probably be put into a different PR).

@takluyver
Copy link
Member

This basically looks good, I've just noted a few places where I think the code can be clearer/simpler. :-)

@mpacer
Copy link
Member Author

mpacer commented Aug 7, 2017

Should we be considering a major release soon? I want to add a new class to handle the input version of this, so maybe we should have it target 6.0?

@takluyver
Copy link
Member

Historically with IPython we did feature releases like '6.0' and bugfix rollups like '6.1', but since the big split we've tended more towards semver style versioning, so we increment the major version to indicate breaking changes. To some extent this is up to the people working on each individual project, though.

I'm not the biggest fan of semver - I think it ignores the complexity of real development, where almost every change can break something - but I know that when we're aiming for a new .0 release, it tends to become a Big Deal, and everyone wants their thing backported to the previous series. So I'm inclined to leave 6.0 until we actually want to break something.

@mpacer
Copy link
Member Author

mpacer commented Aug 7, 2017

K @takluyver would you say this is good to go?

@mpacer
Copy link
Member Author

mpacer commented Aug 7, 2017

@Carreau @minrk I'll merge if either of you look it over and give a 👍.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants